DETAILED ACTION
Response to Amendment 

1. The amendments to the claims have been entered. Claims 1, 7, 9, 10, 24, 30,
32, 33, 36, and 38 are currently amended, and claims 6 and 29 are currently canceled.

Response to Arguments 

2. Applicant's arguments filed July 11, 2005 have been fully considered but they are
not persuasive.

Regarding the argument that Stanford et al. do not disclose semantic processing 
(page 10, lines 6-8), it is noted that while the specification refers to a "semantic 
representation" of the spoken answers given by a user, there is no clear definition of 
what comprises the claimed "semantic representation". Further, the specification states 
that the semantic representation can be used by a "transaction initiator". This suggests 
that the semantic representation is nothing more than the command representation 
used by the computer (e.g. a call to a computer software function in computer code) that 
corresponds with the user's verbal input. That is, if the user verbally requests a certain 
service (e.g. retrieving the user's email), the "semantic representation" is the actual 
function call to implement that service. Without any specific definition of "semantic 
representation" given, the Examiner has interpreted the term to mean a representation 
of the meaning of the user's verbal input. Equivalently, Stanford et al. disclose that the
recognition server (Fig. 1, 108) communicates with user applications (110, column 9,
lines 51-53). The communication is to perform services in the user applications that
were verbally requested by the users, thus the communications between the recognition
server 108 and the user applications 110 are a "semantic representation" of the user's
verbal input and the recognition server 108 "converts a syntactic message to a semantic 
message" as claimed. 

Regarding the argument that the thrust of Stanford et al. is how to accommodate 
speech recognition within the resources of a single computer or microprocessor (page 
10, lines 8-9), the Applicant has relied on a brief statement by Stanford et al. that a goal 
of the invention is to provide a speech recognizer with a minimum memory requirement. 
This is not persuasive because minimizing memory requirements is desirable 
regardless of the implementation. Further, although the Applicant has alleged that the 
architecture works on a single computer (without any indicated support from the 
reference), Stanford et al. specifically states that the system is organized around 
speech recognition functions being deployed as speech recognition servers (column 9, 
line 66 to column 10, line 1).

Regarding the argument that it is "far from trivial or obvious how to make any
specific tasks work over multiple computers in a network" (page 10, lines 13-15), as 
indicated above, Stanford et al. suggests performing the function over multiple 
computers in a network (servers). Furthermore, the disclosure of Stanford et al. 
recognizes the separation of different functions (such as converting a voice data 
message to a phonetic data message) by implementing each function as separate 
blocks or objects. As disclosed by Christensen et al., once a function is implemented as 
a distinct object, implementing the objects remotely provides the advantage of allowing
the physical performance and administration needs of a computer system to be fully
addressed without having to give up the logical model (column 13, lines 63-67). 
Furthermore, the remote automation disclosed by Christensen et al. allows existing 
applications to be implemented remotely without modifying the existing applications 
(column 2, line 64 to column 3, line 6). 

In response to applicant's arguments against the references individually (that 
Christensen et al. do not disclose providing physical performance or administration of a 
speech recognition system, page 11, lines 305), one cannot show nonobviousness by 
attacking references individually where the rejections are based on combinations of 
references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & 
Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). That is, it is the combination of 
Stanford et al. and Christensen et al. that discloses a remote architecture for speech
recognition, therefore the teachings of Christensen et al. (administration and physical 
performance), as applied to the combination, would be for a speech recognition system. 

3. Additionally, the reference to Ekrot et al. (U.S. Patent 5,675,723) has been
corrected herein and indicated on the Notice of References Cited form included with this 
action. 

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 14-18, 22, and 23 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Stanford et al. (U.S. Patent 5,615,296), in view of Christensen et al. 
(U.S. Patent 5,881,230). 

In regard to claim 14, Stanford et al. disclose a method of processing speech 
comprising: 

receiving, at a first server object, a voice data message from a telephone network 
(low bandwidth telephony voice data stream, column 8, lines 22-25); 

transmitting said voice data message to a second server object (connection 
between 100 and 102); 

converting said voice data message to a phonetic data message in said second 
server object (vector quantization block 104 uses Cepstral coefficients converted from 
the input speech, column 8 line 65 to column 9 line 4; to select the closest codebook 
values, each codebook value representing phonetic data, column 9, lines 38-48); 

transmitting said phonetic data message from said second server object to a third 
server object over said first computer network (connection between 104 and 106);

converting said phonetic data message to a syntactic data message in said third 
server object (phonetic time series is converted to word sequences, column 9, lines 49- 
51); 

transmitting said syntactic data message from said third server object to a fourth
server object over said first computer network (connection between 106 and 108); and

converting said syntactic data message to a semantic data message,
representative of said voice data message in said fourth server object (recognition
server converts word sequences to communicate with user applications, column 9, lines
51-53).

Furthermore, Stanford et al. disclose the architecture is independent of hardware 
configurations (column 7, lines 56-57) and implemented for different levels of operation 
over a communications network (column 9, lines 58-65). 

Stanford et al. do not explicitly disclose that the transmissions are sent over a 
first computer network. 

Christensen et al. disclose a system for remote objects to communicate over a 
computer network (Fig. 4 and column 9, lines 10-17). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. so the first (connection between 100 and 102), 
second (connection between 104 and 106), and third (connection between 106 and 
108) connections were formed over a first computer network, as disclosed by 
Christensen et al., since remote automation allows the physical performance and 
administration needs of a computer system to be fully addressed without having to give 
up the logical model, as taught by Christensen et al. (column 13, lines 63-67). 
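
For illustration only, and not as part of the record, the four conversions mapped above for claim 14 can be pictured as a chain of stages. The Python sketch below is a toy under stated assumptions: every function name, the codebook, the lexicon, and the command table are hypothetical and are taken from neither the claims nor Stanford et al. or Christensen et al. Each hand-off between stages stands in for a connection that, in the proposed combination, would be formed over a computer network.

```python
# Illustrative sketch only. All names and data here are hypothetical;
# none of them come from the claims or the cited references.

def voice_server(samples):
    """First server object: receives the voice data message and forwards it."""
    return list(samples)

def acoustic_server(voice):
    """Second server object: voice data -> phonetic data.
    Toy vector quantizer: map each sample to the index of the nearest codebook value."""
    codebook = [0.0, 0.5, 1.0]  # assumed toy codebook
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))
            for x in voice]

def symbolic_server(phonetic):
    """Third server object: phonetic data -> syntactic data (a word sequence).
    Toy lexicon keyed by pairs of codebook indices."""
    lexicon = {(0, 2): "read", (1, 1): "mail"}  # assumed toy lexicon
    words = []
    for i in range(0, len(phonetic) - 1, 2):
        words.append(lexicon.get((phonetic[i], phonetic[i + 1]), "<unk>"))
    return words

def task_server(words):
    """Fourth server object: syntactic data -> semantic data.
    Here the semantic message is a command record an application could act on."""
    commands = {("read", "mail"): {"action": "retrieve_email"}}  # assumed table
    return commands.get(tuple(words), {"action": "unknown"})

LINE_OF_SERVICE = [voice_server, acoustic_server, symbolic_server, task_server]

def run_line_of_service(samples, stages=LINE_OF_SERVICE):
    """Pass a message through each stage in order.
    In a distributed deployment each hand-off would cross the network."""
    message = samples
    for stage in stages:
        message = stage(message)
    return message
```

Because each stage only consumes the previous stage's output, the stages could in principle run in one process or on separate machines without changing this logical chain, which is the point the rejection draws from Christensen et al.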

In regard to claim 15, Stanford et al. disclose said fourth server object (108) is 
coupled to a second computer network (connection between 108 and 110) for receiving 
an application code from a client (110) of said second computer network, said 
application code providing control data for the operation of said speech recognition
system (control data is sent from user applications to request the services of the speech 
recognition system, column 10, lines 13-18). 

In regard to claim 16, the combination of Stanford et al. and Christensen et al., as 
applied to claim 14, above, discloses in Christensen et al. said first computer network is 
one of a local area network and the internet (column 14, lines 44-47). 

In regard to claim 17, Stanford et al. disclose said second computer network is 
one of a local area network and the internet (remote procedure calls must necessarily 
occur over a local area network, column 10, lines 21-25). 

In regard to claim 18, Stanford et al. do not disclose said first, second and third
connections are formed from named pipes. 

Christensen et al. disclose making connections formed from named pipes 
(column 9, lines 49-51). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. to form the connections from named pipes, since a 
named pipe can be used by processes that do not have to share a common process 
origin and the message sent to the named pipe can be read by any authorized process 
that knows the name of the named pipe, which allows a simple method of communicating
between unrelated processes.

In regard to claim 22, Stanford et al. do not disclose that the server objects are
configured by said according to the Distributed Component Object Model (DCOM). 

Christensen et al. disclose that the server objects are configured by the 
Distributed Component Object Model (column 12, lines 50-54). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. to configure the objects according to the DCOM, 
since DCOM allows processes to transparently send and receive information so that 
processing can easily be assigned to different servers as processing resources become 
available. 

In regard to claim 23, Stanford et al. disclose processing said semantic data 
message in said fourth server object according to said application code (user 
applications 110 request the services of the recognition server 108, column 10, lines 13- 
15). 

6. Claims 1-13, 19-21 and 24-38 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Stanford et al. (U.S. Patent 5,615,296), in view of Christensen et al. 
(U.S. Patent 5,881,230), and further in view of Ekrot et al. (U.S. Patent 5,675,723).

In regard to claims 1 and 24, Stanford et al. disclose a speech recognition 
system (Fig. 1) and method comprising: 

a line of service including: 



Application/Control Number: 09/815,808 Page 9 

Art Unit: 2655 

a first server object (100) coupled to a telephone network for receiving a voice 
data message from said telephone network (low bandwidth telephony voice data 
stream, column 8, lines 22-25); 

a second server object (front end comprising blocks 102 and 104) having a first 
connection (connection between 100 and 102) to said first server object (100) for 
receiving said voice data message from said first server object and converting said 
voice data message to a phonetic data message (vector quantization block 104 uses 
Cepstral coefficients converted from the input speech, column 8 line 65 to column 9 line 
4; to select the closest codebook values, each codebook value representing phonetic 
data, column 9, lines 38-48); 

a third server object (106) having a second connection (connection between 104 
and 106) to said second server object (front end comprising blocks 102 and 104) for 
receiving said phonetic data message (phonetic time series from front end) from said 
second server object and converting said phonetic data message to a syntactic data 
message (phonetic time series is converted to word sequences, column 9, lines 49-51); 
and 

a fourth server object (108) having a third connection (connection between 106 
and 108) to said third server object (106) for receiving said syntactic data message 
(word sequence) from said third server object and converting said syntactic data 
message to a semantic data message, which is representative of said voice data 
message (recognition server converts word sequences to communicate with user 
applications, column 9, lines 51-53); 

Furthermore, Stanford et al. disclose the architecture is independent of hardware 
configurations (column 7, lines 56-57) and implemented for different levels of operation 
over a communications network (column 9, lines 58-65). 

Stanford et al. do not explicitly disclose that the first, second, and third 
connections are formed over a first computer network.

Christensen et al. disclose a system for remote objects to communicate over a 
computer network (Fig. 4 and column 9, lines 10-17). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. so the first (connection between 100 and 102), 
second (connection between 104 and 106), and third (connection between 106 and 
108) connections were formed over a first computer network, as disclosed by 
Christensen et al., since remote automation allows the physical performance and 
administration needs of a computer system to be fully addressed without having to give 
up the logical model, as taught by Christensen et al. (column 13, lines 63-67). 

Furthermore, in regard to claim 24, Christensen et al. disclose the physical 
layering can be changed without changing the logical model (column 14, lines 10-12). 
For the same reasons as given above, therefore, it would have been obvious to one of 
ordinary skill in the art at the time of invention to further modify Stanford et al. to 
combine both the conversion of the voice data message to a phonetic data message 
and the conversion of the phonetic data message to a syntactic data message in a 
single speech recognition server. 

The combination of Stanford et al. and Christensen et al. does not disclose a
control monitor for controlling the configuration of said first, second, third and fourth 
server objects in said line of service. 

Ekrot et al. disclose a control monitor (Fig. 3, backup server 200) that controls 
the configuration of server objects (if one of primary servers 202 and 204 fails, the 
backup server 200 acts as a server in place of the server that failed, column 5, lines 19- 
21 and lines 37-42). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify the combination of Stanford et al. and Christensen et al. to include
a control monitor for controlling the configuration of said first, second, third and fourth 
server objects in said line of service, in order to keep the system functional if one of the 
server objects failed without taking the system offline. 

In regard to claims 2 and 25, Stanford et al. disclose said fourth server object 
(108) is coupled to a second computer network (connection between 108 and 110) for 
receiving an application code from a client (110) of said second computer network, said 
application code providing control data for the operation of said speech recognition 
system (control data is sent from user applications to request the services of the speech 
recognition system, column 10, lines 13-18). 

In regard to claims 3 and 26, the combination of Stanford et al. and Christensen 
et al., as applied to claims 1 and 24, above, discloses in Christensen et al. said first 
computer network is one of a local area network and the internet (column 14, lines 44-
47). 

In regard to claims 4 and 27, Stanford et al. disclose said second computer 
network is one of a local area network and the internet (remote procedure calls must 
necessarily occur over a local area network, column 10, lines 21-25). 

In regard to claims 5 and 28, Stanford et al. do not disclose said first, second and
third connections are formed from named pipes. 

Christensen et al. disclose making connections formed from named pipes 
(column 9, lines 49-51). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. to form the connections from named pipes, since a 
named pipe can be used by processes that do not have to share a common process 
origin and the message sent to the named pipe can be read by any authorized process 
that knows the name of the named pipe, which allows a simple method of communicating
between unrelated processes. 

In regard to claims 7-9 and 30-32, Stanford et al. and Christensen et al. do not 
disclose periodically transmitting a status signal to said system monitor. 

Ekrot et al. disclose server objects that periodically transmit a status signal to 
said system monitor, wherein the transmission of said periodic status signal from said 
server objects to said system monitor indicates that said server objects is operational,
and wherein a nontransmission of said periodic signal indicates that one of said server 
objects is disabled (a heartbeat signal is sent from primary servers 202 and 204 to 
standby server 200, and when the standby server 200 ceases to receive signals from 
one of the primary servers 202 and 204, one of the primary servers has failed, column 
5, lines 30-37; Furthermore, when one of the primary servers 202 and 204 has failed, 
the standby server 200 acts as a backup for the primary servers, column 5, lines 37-42). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify the combination of Stanford et al., and Christensen et al. to 
periodically transmit a status signal to the system monitor and to include backup objects 
for objects that stop sending status signals, so the system as a whole would continue to 
function even if one of the server objects failed. 
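
For illustration only, and not as part of the record, the periodic status signal discussed above can be sketched as a monitor that records when each server object last reported and treats silence beyond a timeout as failure. The class name `StatusMonitor`, the server labels, and the timeout value are hypothetical; Ekrot et al. are cited here only for the general heartbeat arrangement, not for any of these details. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
# Illustrative sketch only. StatusMonitor, the server labels, and the timeout
# are hypothetical and do not come from the claims or the cited references.

class StatusMonitor:
    """Tracks the last status signal received from each server object."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated before a server is presumed disabled
        self.last_seen = {}      # server name -> time of most recent status signal

    def heartbeat(self, server_name, now):
        """Record a periodic status signal from a server object."""
        self.last_seen[server_name] = now

    def disabled(self, now):
        """Return the servers whose status signal has not arrived within the timeout."""
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout)


monitor = StatusMonitor(timeout=5.0)
monitor.heartbeat("primary_202", now=0.0)
monitor.heartbeat("primary_204", now=0.0)
monitor.heartbeat("primary_202", now=4.0)   # 202 keeps reporting; 204 goes silent
failed = monitor.disabled(now=6.0)          # servers a backup object would need to replace
```

A backup arrangement of the kind relied on in the rejection would act on the returned list, for example by starting a replacement for each server named in it.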

In regard to claims 10 and 33, Stanford et al. do not disclose that the server 
objects are configured by said according to the Distributed Component Object Model 
(DCOM). 

Christensen et al. disclose that the server objects are configured by the 
Distributed Component Object Model (column 12, lines 50-54). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. to configure the objects according to the DCOM, 
since DCOM allows processes to transparently send and receive information so that 
processing can easily be assigned to different servers as processing resources become
available. 

In regard to claims 11 and 34, Stanford et al. do not disclose that each
server object includes a post office for addressing and routing messages through the 
line of service. 

Christensen et al. disclose that each server object includes a post office (RA proxy
object application 68) for addressing and routing messages through the line of service 
(RA proxy 68 uses the network address of the remote computer to route the locally 
called object to the remote computer, column 9, lines 21-36). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. to include a post office for addressing and routing 
messages, since a post office manages the ordering and packaging of data to suit the
particular network link and protocol, as taught by Christensen et al. (column 10, lines 
27-35). 

In regard to claims 12 and 35, Stanford et al. disclose additional lines of service 
connected between said telephone network and said second computer network (several 
versions of the recognition server are run, column 10, lines 50-52).

In regard to claim 13, Stanford et al. disclose the architecture is independent of
hardware configurations (column 7, lines 56-57) and implemented for different levels of 
operation over a communications network (column 9, lines 58-65). 

Christensen et al. disclose a system for remote objects to communicate over a 
computer network (Fig. 4 and column 9, lines 10-17).

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. to make the voice, acoustic, symbolic and task server
objects, as well as the voice server object, speech recognition server, and task server 
object remote to each other, since remote automation allows the physical performance 
and administration needs of a computer system to be fully addressed without having to 
give up the logical model, as taught by Christensen et al. (column 13, lines 63-67). 

In regard to claim 19, the combination of Stanford et al. and Christensen et al. does
not disclose a control monitor for controlling the configuration of said first, second, third
and fourth server objects in said line of service. 

Ekrot et al. disclose a control monitor (Fig. 3, backup server 200) that controls 
the configuration of server objects (if one of primary servers 202 and 204 fails, the 
backup server 200 acts as a server in place of the server that failed, column 5, lines 19- 
21 and lines 37-42). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify the combination of Stanford et al. and Christensen et al. to include
a control monitor for controlling the configuration of said first, second, third and fourth
server objects in said line of service, in order to keep the system functional if one of the
server objects failed without taking the system offline. 

In regard to claims 20-21, the combination of Stanford et al., and Christensen et 
al. as applied to claim 14, does not disclose periodically transmitting a status signal to 
said system monitor. 

Ekrot et al. disclose server objects that periodically transmit a status signal to 
said system monitor, wherein the transmission of said periodic status signal from said 
server objects to said system monitor indicates that said server objects is operational, 
and wherein a nontransmission of said periodic signal indicates that one of said server 
objects is disabled (a heartbeat signal is sent from primary servers 202 and 204 to 
standby server 200, and when the standby server 200 ceases to receive signals from 
one of the primary servers 202 and 204, one of the primary servers has failed, column 
5, lines 30-37; Furthermore, when one of the primary servers 202 and 204 has failed, 
the standby server 200 acts as a backup for the primary servers, column 5, lines 37-42). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify the combination of Stanford et al., and Christensen et al. to 
periodically transmit a status signal to the system monitor and to include backup objects 
for objects that stop sending status signals, so the system as a whole would continue to 
function even if one of the server objects failed.

In regard to claim 36, Stanford et al. disclose the speech recognition server (the
combination of the front end and back end) includes an acoustic server object for receiving said
voice data message from said voice server object and converting said voice data 
message to said phonetic data message (vector quantization block 104 uses Cepstral 
coefficients converted from the input speech, column 8 line 65 to column 9 line 4; to 
select the closest codebook values, each codebook value representing phonetic data, 
column 9, lines 38-48) and a symbolic server object (106) for receiving said phonetic 
data message from said acoustic server object and converting said phonetic data 
message to said syntactic data message (phonetic time series is converted to word 
sequences, column 9, lines 49-51). 

In regard to claims 37 and 38, Stanford et al. disclose the architecture is 
independent of hardware configurations (column 7, lines 56-57) and implemented for 
different levels of operation over a communications network (column 9, lines 58-65). 

Stanford et al. do not explicitly disclose the voice, acoustic, symbolic and task 
server objects, as well as the voice server object, speech recognition server, and task 
server object are remote to each other. 

Christensen et al. disclose a system for remote objects to communicate over a 
computer network (Fig. 4 and column 9, lines 10-17). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Stanford et al. to make the voice, acoustic, symbolic and task server
objects, as well as the voice server object, speech recognition server, and task server 
object remote to each other, since remote automation allows the physical performance
and administration needs of a computer system to be fully addressed without having to 
give up the logical model, as taught by Christensen et al. (column 13, lines 63-67). 